Data analysis method for parallel DHP based on Hadoop
YANG Yanxia, FENG Lin
Journal of Computer Applications
2016, 36 (12): 3280-3284.
DOI: 10.11772/j.issn.1001-9081.2016.12.3280
A bottleneck of the Apriori algorithm for mining association rules is the use of the candidate set C2 to generate the frequent 2-item set L2. The Direct Hashing and Pruning (DHP) algorithm uses a generated hash table H2 to delete unneeded candidate item sets from C2, which improves the efficiency of generating L2. However, the traditional DHP is a serial algorithm and cannot deal effectively with large-scale data. To solve this problem, a parallel DHP algorithm, termed H_DHP, was proposed. First, the feasibility of a parallel strategy for DHP was analyzed and proved theoretically. Then, parallel methods for generating the hash table H2 and the frequent item sets L1 and L3 through Lk were developed on Hadoop, and the association rules were generated using the HBase database. Simulation results show that, compared with the DHP algorithm, the H_DHP algorithm performs better in data processing efficiency, the size of the data set handled, speedup, and scalability.
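To make the role of the hash table H2 concrete, the following is a minimal serial sketch of DHP-style pruning of C2 in Python. It is not the H_DHP implementation from the paper: the function names `bucket` and `dhp_frequent_pairs`, the hash function, the bucket count, and the assumption that items are integers are all illustrative choices. Since each bucket count is a sum over transactions, it could in principle be computed per data split and then added up, which is the kind of property a MapReduce version would rely on; the sketch below does not attempt that.

```python
from collections import Counter
from itertools import combinations


def bucket(pair, n_buckets):
    """Hypothetical hash function for a 2-item set of integer items."""
    a, b = pair
    return (a * 10 + b) % n_buckets


def dhp_frequent_pairs(transactions, min_support, n_buckets=7):
    """Return (L1, C2, L2) using a DHP-style hash table to prune C2."""
    item_counts = Counter()
    h2 = [0] * n_buckets  # hash table H2: one counter per bucket

    # Pass 1: count single items and hash every 2-subset into H2.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            h2[bucket(pair, n_buckets)] += 1

    l1 = sorted(i for i, c in item_counts.items() if c >= min_support)

    # A pair can only be frequent if its bucket count reaches min_support,
    # so pairs in light buckets are dropped from C2 before counting.
    c2 = [p for p in combinations(l1, 2)
          if h2[bucket(p, n_buckets)] >= min_support]

    # Pass 2: count the exact support of the surviving candidates only.
    c2_set = set(c2)
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in c2_set:
                pair_counts[pair] += 1

    l2 = {p: c for p, c in pair_counts.items() if c >= min_support}
    return l1, c2, l2


if __name__ == "__main__":
    data = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3, 4]]
    l1, c2, l2 = dhp_frequent_pairs(data, min_support=2)
    print("L1:", l1)
    print("C2 after pruning:", c2)
    print("L2:", l2)
```

Hash collisions can only make a bucket count larger than the true support of a pair, so the pruning never discards a frequent pair; a poor hash function just leaves more candidates in C2.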